Combining evidence from a generative and a discriminative model in phoneme recognition

Authors

  • Joel Pinto
  • Hynek Hermansky
Abstract

We investigate combining two sources of evidence for phoneme recognition in a multi-stream framework: the log-likelihood of the features under a generative Gaussian mixture model, and the posterior probabilities of phonemes from a discriminative multilayer perceptron. Two multi-stream combination techniques, early integration and late integration, are used to combine the evidence from these models. With multi-stream combination, we obtain a phoneme recognition accuracy of 74% on the standard TIMIT database, an absolute improvement of 2.5% over the best single stream.
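The abstract does not spell out the combination rules, so the following is only a minimal sketch of what early and late integration commonly look like, not the authors' implementation. It assumes early integration means concatenating the per-frame vectors of the two streams before a single classifier, and late integration means a weighted log-linear combination of per-frame phoneme posteriors from two separate classifiers. The function names, the `weight_a` parameter, and the toy numbers are all hypothetical.

```python
import math


def early_integration(stream_a, stream_b):
    """Assumed early integration: concatenate per-frame vectors of two streams."""
    if len(stream_a) != len(stream_b):
        raise ValueError("streams must have the same number of frames")
    return [list(a) + list(b) for a, b in zip(stream_a, stream_b)]


def late_integration(post_a, post_b, weight_a=0.5):
    """Assumed late integration: weighted log-linear combination of two
    posterior vectors for one frame, renormalized to sum to 1."""
    if len(post_a) != len(post_b):
        raise ValueError("posterior vectors must cover the same classes")
    eps = 1e-12  # avoids log(0) for classes with zero posterior
    scores = [
        weight_a * math.log(pa + eps) + (1.0 - weight_a) * math.log(pb + eps)
        for pa, pb in zip(post_a, post_b)
    ]
    top = max(scores)  # subtract the max for numerical stability
    exp_scores = [math.exp(s - top) for s in scores]
    total = sum(exp_scores)
    return [e / total for e in exp_scores]


# Toy data (hypothetical): two frames, two values per stream.
gmm_stream = [[-12.3, -15.1], [-11.0, -14.2]]  # e.g. log-likelihoods
mlp_stream = [[0.7, 0.3], [0.4, 0.6]]          # e.g. posteriors
combined_features = early_integration(gmm_stream, mlp_stream)

# One frame, three phoneme classes, equal stream weights.
combined_posterior = late_integration([0.6, 0.3, 0.1], [0.5, 0.4, 0.1])
```

With equal weights the late-integration rule reduces to a normalized geometric mean of the two posterior vectors; other rules (sum, maximum, or learned weights) would fit the same interface.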


Similar resources

Allophone-based acoustic modeling for Persian phoneme recognition

Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation, which refers to the integration of sounds, is one of the major obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighboring phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...


Automatic Social Role Recognition in Professional Meetings

This paper investigates the influence of social roles on the conversation style and linguistic usage of participants in professional meeting recordings. First, we implement a generative model to capture the sequential nature of conversations in terms of participants' turn-taking behavior. In parallel, the system also employs a probabilistic discriminative classifier on a set of high level fea...


Improving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM

Improving phoneme recognition has attracted the attention of many researchers due to its applications in various fields of speech processing. Recent research shows that using deep neural networks (DNNs) in speech recognition systems significantly improves their performance. DNN-based phoneme recognition systems have two phases: training and testing. Mos...


Error Pattern Detection Integrating Generative and Discriminative Learning for Computer-Aided Pronunciation Training

Computer-Assisted Language Learning aims to have computers serve as virtual language tutors that help people learn non-native languages in today's globalized world. In this paper, we propose a framework that incorporates specially designed discriminative models with carefully trained generative models for the task of pronunciation error pattern detection. For each phoneme we train one or ...


Combining Evidence from Unconstrained Spoken Term Frequency Estimation for Improved Speech Retrieval

Title of dissertation: Combining Evidence from Unconstrained Spoken Term Frequency Estimation for Improved Speech Retrieval. J. Scott Olsson, Doctor of Philosophy, 2008. Dissertation directed by: Associate Professor Douglas W. Oard, College of Information Studies. This dissertation considers the problem of information retrieval in speech. Today's speech retrieval systems generally use a large vocab...




Publication date: 2008